Release 53 Last change: 970506 Last Document Update: 970506
DG USER COMMANDS DG
*
NAME
dg - Data-Grep. Like grep but for searching a free-form
flatfile database, printing the entire records rather than
just the lines containing the searched-for phrase.
Pronounced "dig" as in "digging out data."
*
SYNOPSIS
dg [-options] srchstring infile
srchstring is optional (& ignored) with -s, -l, -U options
srchstring is optional (not ignored) with -a, -A options
srchstring may be multiple with -m, -M, -y options
srchstring is disallowed with the numeric (-n) option
srchstring may be a file (a list of searchterms) with -f option
srchstring may be a file (a list of updates) with -B option
infile may not be wildcarded unless using dgw batch file (below)
*
DESCRIPTION
dg will search a text file for a given phrase and print all "records"
containing that phrase to standard output.
dg is intended for free form "flat" files of text containing records
(multi-line chunks or "paragraphs") separated by a defined delimiter
character (default is "*"). The delimiter character must occur at the
beginning of a line (but see options). Normally, no useful data should
be present ON the delimiter line, as it would be lost on output except
under certain options.
A "paragraph" mode (-dd option) treats blank lines as record
delimiters.
Unlike grep, which would report only the specific lines containing the
search term (or a fixed number of lines either side of the find),
data-grepper will print the entire record in which the search term was
found-- or a specified number of key lines for that record.
The records of the data file are read in order, and records with
"hits" are sent to standard output. Maximum line length is 200
(500 in UNIX versions). Overlong lines on input are tolerated,
but split to meet max line length. dg has no limitations as to
record length, but the price paid is that it cannot accept standard
input from a pipe (the input file is opened twice).
The program does not directly support wildcards, nor does it
understand all unix "regular expressions." A wildcarded list of files
to search may be done using the dgw batch file.
The searchstring is normally used literally on the command line, or,
by using the -f option, up to 100 searchstrings may be specified in a
separate file (100-DOS, 1000-UNIX). Searchstring length is the same
as maximum linelength except when using the -f option, where
searchstring length is limited to 20.
dg searchterm file_to_search
dg -f file_of_searchterms file_to_search
If the searchterm contains spaces, the searchterm must be enclosed in
single (UNIX) or double (DOS) quotes when used on the command line.
Depending on options selected, the first 1-9 lines of each record may
be treated as key lines. Either the search or the report or both may
be limited to these lines.
dg -k3K5 searchterm file_to_search
will look for searchterm only in the first 3 lines of
each record, and print the first 5 lines of records
when a match is found.
The output normally retains the "*" delimiters, thus becoming a subset
of the original data file, ready for further searches.
Other single characters can be used as delimiters by specifying a new
delimiter on the command line.
dg -d# searchterm file_to_search
will expect the # sign as record delimiter.
The special case ("paragraph mode"):
dg -dd searchterm file_to_search
will treat blank lines as record delimiters.
There is an option for additional, secondary sub-delimiters (which may
be blank lines) if your data records are large enough to warrant this.
Null records, those with only newlines between successive delimiter
lines, are ignored and will not be present on output.
Records with whitespace (spaces, tabs) are not treated as nulls.
The original delimiter lines are normally a single "*" character, but
may contain dashes or text following the delimiter character. The
extra characters are treated as "comments" NOT to be printed on output
(unless the -R option is named). Indeed, a whole series of delimiter-
prefixed lines may be included in the master file as "comments" or
documentation, not to be printed on output.
---------------------------------------------------------------------------
*
OPTIONS (in order of average overall usefulness)
-C Case sensitive (default is case insensitive).
-c Count only. Report the number of records
containing the search phrase.
-v inVert sense. Report records not containing
the search phrase.
-d$ Use $ or other following char as Delimiter
Exception: Use -dd (yes - lower case d repeated)
and the system will treat a blank line as the
delimiter for search (sort of like considering
paragraphs as records). Output will, however,
insert the standard "*" delimiter.
-dd Paragraph mode. Blank lines are record Delimiters.
True blank lines only- no spaces or tabs.
See -d above.
-k[n] Search in Keyword-lines-only. Declare the first n (1-9)
lines as keyword-lines. Default 1st line only.
-K[n] Limit output report to Keyword-lines-only.
The first n (1-9) lines are Keyword-lines.
Default 1st line only. n for -k and -K
options limited to 1 digit. They are
independent, and can be used together.
-s Status of data file: give record count only.
If used with -V option, reports misc. file data.
Ignores null records. (Reports them if verbose).
(overrides other options on the command line)
-l Treat aLL records as hits. No searchterm needed.
Useful with -D, -a, -A, -K options.
dg -Kpl will give an undelimited list of
first (key) lines. dg -Kplh# gives a similar
list of major/minor keylines for files with
major (*) records and sub(#)records.
Note- Other uses of -l and -h together are
not recommended.
-w Match only on Words. (phrase bound by spaces
or line boundary or any non-alpha, non-digit)
Underscore (_) is treated as part of a word.
-u Like -w but Underscore (_) is treated as a
word delimiter (as if whitespace) as well.
-x Search phrase is found even if it crosses over
a line boundary (X-over). One-line crossover
only. Ignores trailing but not leading
spaces on lines. Best when used with -T
to ignore leading spaces too. Note: trailing
hyphens are also ignored so that normal word
hyphenation is dealt with.
-f Get searchstrings from File.
Use filename to replace search phrase on the
command line. Leading and trailing spaces in
the file of phrases are stripped. For DOS, number of
searchstrings in the file is limited to 100
*and* a 20 character string size limit is imposed.
Finds are reported in the order they occur in the
data file, not the order of the file of terms.
(use batch files/unix scripts if you must extract
records in other than data file order.)
-F[n] The searchterm must be found in Field n of a
line to be considered a hit. Incompatible
with -mMxnUD and ^$ usage. A field of a line
is defined as in a default awk usage-- words or
terms separated by whitespace, with leading/trailing
whitespace ignored. Use -F with no numerics to
indicate the last field of a line regardless of
the number of fields there. See extended discussion
below.
-L Affects -F option. Lax enforcement of field numbers
and lengths. See extended discussion below.
-m[n] Expect n Multiple search terms on the command line,
each of which must be present on_a_single_line in a
record to cause a find. If n is omitted, n=2.
Incompatible with -v. Max n is 9. If the searchterms
are identical, 1 hit suffices. If used with -x,
finds must be within about 1 line of each other.
-M[n] Expect n Multiple search terms on the command line,
each of which must be present somewhere_in_the_record
to cause a find. If n is omitted, n=2.
Incompatible with -v. Max n is 9.
-E[n] Look for Extras-- expect 1 search term on the command
line, and report records having that term on at
least n separate lines. If n omitted, n=2.
Incompatible with -v.
-p Plain output. Do not print the delimiter on output.
Exception: with -y, kills only the sub-separator line.
-e Exact whole-line match required to cause a find.
-T Ignore L & R (lead/Trail) spaces on all lines.
Useful with -e or -x
-Q Quit on first find of term; on first find of *each* term
when used with -f. Useful with files that redundantly
repeat records, e.g. expanded procedural flows. If more
than one -f term is found in a record, all are satisfied
by printing that record. Do not confuse this with
-m or -M searches. The -Q option then will quit on the
first find satisfying the -m or -M condition.
-W Print only the record numbers where the finds occur.
("Which" records?)
-h$ Use $ or other following char as an added, secondary
"Helper" delimiter. The secondary delimiter will be
recognized whether in the first or second position
on a line. Output will be preceded by the first
line of the main record, and the phrase: "PARTIAL
RECORD:" Not compatible with -x option.
Exception: Use -hh or terminal -h with no character
specified, and the system will treat a virtual blank
line (true blank lines, or lines with only spaces/tabs)
as a secondary delimiter for search (sort of like
considering paragraphs as sub-records within explicitly
delimited records). Output will, however, insert the
standard "*" delimiter.
Example: dg -hhCF1 -h dgman
will give help on the -h option of dg.
-Dfname Divide(distribute) output:
Write records found to files fname0001, fname0002...
one file for each find. Supported ONLY as last option
in the option list. Limited to 9999 output files.
-n#[...] Get record by Number, e.g. -n456 = get 456th record
Compatible ONLY with -vKqod$... NOT with -aA
Null records are ignored when counting.
Supported ONLY as last option in the option list.
A syntax of -n#[#####],#[#####] is supported to retrieve
a range of record numbers. Particularly useful
when a large file must be divided.
-a Print whole data file, Append contents of zzapfile
to finds. See discussion below: UPDATING RECORD STATUS
-A Print whole data file, Append zzapfile line 1 to
keyline 1 of finds. See discussion below: UPDATING
RECORD STATUS
-j Affects -a, -A options- don't print whole
file, but Just the records with finds.
-J Affects -a, -A options- tacks a "Jumped" record
number onto "found" records.
-r Print the delimiter followed by dashes (like a
Ruler line) to enhance visual separation of records.
-R Retain content of original delimiter lines.
The default is to drop additional characters
following the delimiter. (The default permits
the delimiter line to contain "private"
file documentation.)
-B Fold-in Big data updates. Allows automated updates
of large record sets based on a file of update
directions. Highly useful but only in limited
circumstances. See discussion below.
-U Uniqify a set of records. Directs deletion of
repeated records based solely on the last field of
the first keyline. Limited filesizes except in
UNIX versions. See extended discussion below.
-V Verbose. Show prefatory/summary remarks. Use
with -s for datafile status report. Use -Vq with
dgw batch file to record filenames searched.
-H Emphasize the line in the record where the
search conditions were met. Prints markers
(happy faces if in DOS) at beginning of the
"Highlighted" line. Seldom needed, but can be
helpful when individual records are long.
-N Print a Negative message if no records are found.
Normally, there is no output when there are no
finds.
-o Null argument. Does nOthing. Useful from
some batch files/scripts.
-G A Grep-like option. Only the lines with the
match are printed. Use only if a real grep is
unavailable. No REGEXP, but usable with the
following options:
-w, -u, -c, -v, -T, -C, -e, -f, -m, -F, -N, ^$
Not usable with -k,-K,-x,-D,-Q,-y
nor with most other options that are record-oriented.
Inappropriate options are not all trapped, but
generally have no effect.
-y The grep-rest option. Unrelated to -G.
"digs" for a record, Yet greps it too.
Usable with -K such that IF a record is
a "hit" the -K keylines are printed, and
followed by any remaining lines in that record
that contain one of a set of other searchterms.
I.E.- print the keylines of finds and grep the
rest of the record for other searchterms.
The -m option and syntax must be used, but the
the FIRST term given in -m syntax becomes the
SOLE record searchterm and all OTHER -m terms
become what we grep for after the keylines.
Example:
dg -ykK2m3r gold melt boil elements
will print the 1st 2 keylines of records in the
file "elements" having "gold" in the 1st keyline,
and then print any remaining lines in the record
having the terms "melt" or "boil". Use with
the -r option for best visual separation of
resultant records.
-I Ignore delimiter if repeated in place 2.
i.e., if a line begins with ** then
Treat it as just a text line, not a delimiter line.
Useful with certain originals when you don't want
to clean them up first.
-S Add delimiters (Stars) to a file. A delimiter line is
added _before_ each line containing the search term.
Use -Sf and a file of searchterms when appropriate.
-P Add delimiters (Post-stars) to a file. Like -S,
but delimiters are added _following_ each line
that is a hit.
-q Quiet. No extraneous prefatory/summary remarks
(default, but retained for historical reasons).
Exception: Use -Vq with dgw batch file to record
filenames searched.
-i[n] Recognize an Indented delimiter anywhere in the first
n characters (1-9) of a line. Useful in delimiting
code files when the delimiter must reside inside
a comment, e.g.,
/* (c) , //* (c++) , #* (unix), ;* (lisp) , REM * (dos)
Especially useful with -T to kill leading whitespace
for files that have extensive indentation schemes.
Thus a -Ti option allows #* to work with any amount
of leading whitespace.
-Z[Z][1] FuZzy searches-- Look for approximate matches.
The -Z option uses a SOUNDEX algorithm that assumes
the first letter of every word is unfuzzy. Use
the -ZZ option to fuzz even the first letter,
e.g., batter with a searchterm of "patter", but
expect lots of false hits. A Z1 option uses a stem
algorithm that might find "silliness" when you
search for "silly". All three fuzzy approaches
are desperation moves, sometimes laughable.
You may need the -H option to figure out which
line caused the hit. Expect "fuzzy" to be more
like "hairy" or even "wooly" most of the time.
The SOUNDEX approach is an old classic, which gives
decent results when you must search with names or
commonly misspelled words such as nuclear and
personnel, but expect lots of extra drivel as well.
Only the first few syllables are checked. If you're
curious, you can inspect the kind of coding produced
for any searchterm by adding a -N option using a
file you know will NOT produce a match. The "not
found" report will show the soundex or stem code of
the searchterm. Alternatively, add a -V verbose
option and wade through the whole mess.
Expect junk results if you use small searchterms,
numeric searchterms, or searchterms that include
spaces or punctuation. The -w option is disallowed.
Although only words are really treated, there can be
no guarantee of a true wordmatch. The -e option is
allowed, but a hit indicates exactness only in the
coding string, not in the actual text. All fuzzy
searches are automatically case-insensitive.
-O Show PrOgress-- when working very large files,
print some sign of life every 1000 lines
to screen only.
^$ These are not command line options, but implied
options nonetheless. Though full unix regular
expressions are not supported, the ^ and $
expressions are:
dg ^foo filename
means look for "foo" at the beginning of a line.
Similarly:
foo$ means foo at the end of a line
\^foo means search for literal "^foo"
foo\$ means search for literal "foo$"
Note that a search for ^RAT$ is designed to
succeed on "RATCELLAR WITH RAT"
Use -e for the unix sense of ^RAT$
where the intent is SOL-phrase-EOL.
---------------------------------------------------------------------------
*
USAGE: General
This utility is not designed to replace full featured databases with
formal query languages. It is suitable for keeping utility files, such
as address or contact files or software requirements files, when the
purpose of the search is not to settle just for individual lines
containing the desired phrase, but to get the entire paragraph or
record. It is like grep with some notion of context.
It is useful from the command line, but most powerful when used in
batch files that grab a set of records and then do further processing
on them.
While dg is oriented to asterisk-delimited text files, any single-
character delimiter can be used, including blank lines.
Given a data file of simple paragraphs separated by blank lines, dg
can behave as if the blank lines were the "asterisk" delimiters:
dg -dd searchterm datafile
The output will be asterisk-delimited, unless you add the -p (plain)
option. The blank lines must not have hidden spaces or tabs, unless
you use the -T option (trim lead/trail spaces/tabs) option as well.
----------------------------------------------------------------------
*
USAGE: Null Records
Null Records:
A record is "null" if it has no bytes or only line-ending
bytes. Null records are ignored for output, and when
counting to find an Nth record.
Null Keyword lines.
The -sV option will report records that have no data on
keyword lines.
----------------------------------------------------------------------
*
USAGE: With awk and grep in scripts
dg was originally designed to work in concert with awk in scripts
or batch files-- working against initially unformatted text files.
An aside: if you are not familiar with awk, you are
missing one of the best tools available for manipulating
text files. Get a copy of Rob Duff's awk or the GNU
gawk for DOS. It has almost all the power of PERL,
but when you read an awk script six months after you've
written it, you'll understand it. PERL is best only for
folks who will use it every day.
Given a master file of records without explicit delimiters, an easily
designed awk script can place delimiters at appropriate places in a
temporary copy of the original file using either a simple or fairly
sophisticated set of guidelines. dg is then used to do searches on
the temporary file. If the master is updated, the awk script is re-
run to update the temporary file.
The dg -S or -P option can be used instead of an awk for very simple
cases. Or-- if you simply want to trade blank lines for "*" delimiters,
use a -ddl option; the output will have "*" delimiters where the blank
lines were.
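The blank-lines-to-stars trade can be emulated in a few lines. This is an
illustrative Python sketch of what the -ddl combination described above
produces (the function name is invented, and the exact placement of the
emitted delimiters is an assumption):

```python
def paragraphs_to_starred(text):
    """Rough emulation of `dg -ddl file`: treat true blank lines
    (no spaces or tabs) as record delimiters, and emit a '*'
    delimiter line in their place."""
    out = ["*"]          # assumed leading delimiter before record 1
    blank = True
    for line in text.splitlines():
        if line == "":                 # true blank line only
            if not blank:
                out.append("*")        # close the paragraph
            blank = True               # runs of blanks emit one '*'
        else:
            out.append(line)
            blank = False
    return "\n".join(out)
```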
If your primary data is in a commercial database, you may find it
useful to dump a subset of the database to a delimited ASCII file.
Then, for the rest of the day, you can dig at it with dg, directly or
from scripts, without needing to keep the (potentially memory-hogging
or licensed-user-limited) database software running.
----------------------------------------------------------------------
*
USAGE: If records have labeled lines
dg is powerful when used with grep against data files designed to have
a number of labeled lines or "slots" in each record.
With a file such as:
NAME: John Jones
PHONE: 999-9999
UNIT: T-44
EXPER: C, C++, aerodynamics
ASSIGN: Rufus GUI modules
DUE: JAN 95
*
NAME: Jane Smith
PHONE: 999-8888
UNIT: T-55
EXPER: LISP, scheduling, traffic flow, NL
ASSIGN: Rufus NL interface
DUE: FEB 95
*
a command line or script call such as:
dg smith filename | grep EXPER
would yield:
EXPER: LISP, scheduling, traffic flow, NL
or--
dg -ykKm3 smith exper assign filename
would yield:
NAME: Jane Smith
...................
EXPER: LISP, scheduling, traffic flow, NL
ASSIGN: Rufus NL interface
whereas:
dg smith filename
alone would print smith's entire record.
----------------------------------------------------------------------
*
USAGE: AND searches
One can use a script that searches for records with the first term,
redirecting output to a temporary file-- which is then searched for
records with the second term
dg phrase1 filename > temp
dg phrase2 temp
Alternatively, use the -M option:
dg -M phrase1 phrase2 filename
dg -M4 phrase1 phrase2 phrase3 phrase4 filename
If the search is intended to "AND" multiple phrases on a _single_ line,
use the -m option. This is particularly useful when, e.g., you want
to find records containing "DEC" but only if the "DEC" is on a
HARDWARE line and not on a MONTH or DATE line.
dg -m DEC HARDWARE filename
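The difference between the two AND flavors can be shown with a short sketch.
This is an illustrative Python emulation (not dg's actual code) of the -M
(anywhere in the record) and -m (on a single line) behaviors, operating on
records represented as lists of lines:

```python
def and_search(terms, records):
    """Emulates dg -M: a record is a hit only if every term appears
    somewhere in it (case-insensitive, any lines)."""
    def hit(rec):
        text = "\n".join(rec).lower()
        return all(t.lower() in text for t in terms)
    return [r for r in records if hit(r)]

def and_search_one_line(terms, records):
    """Emulates dg -m: all terms must co-occur on a single line
    of the record."""
    return [r for r in records
            if any(all(t.lower() in line.lower() for t in terms)
                   for line in r)]
```

The DEC/HARDWARE example above is exactly the -m case: "DEC" and "HARDWARE"
scattered across different lines of a record do not count.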
----------------------------------------------------------------------
*
USAGE: OR searches
Put the set of searchterms in a file and use the -f option.
----------------------------------------------------------------------
*
USAGE: Searching Multiple Files
There is no provision for wildcards in the datafile name.
Each datafile must be searched individually.
(Use awk to create a script that calls dg against
each of a list of datafiles.)
Alternatively, use the following batch file or its unix equivalent.
Note that the results are always written to a file named ztempx,
in the current working directory. The batch file will complain if
you try to search ALL (*.* or *) files in the current working
directory since that would include the output ztempx file.
If you must search ALL files in a directory (* or *.*), do so from
a higher level directory. dgw -o srchterm asubdir/*.*
will work fine.
The dgw output will not name the file where finds are found,
unless you include a -Vq option. If you do that, dg will
insert a record naming each file searched. To clean out those
advisories, just use dg again with dg -kv "Searching file:" ztempx
to get a "clean" set of results.
---------------------cut here -----------------------------------
@ECHO OFF
rem bat file to use dg with wildcarded list-of-files-to-search
rem usage dgw -dg_arguments searchterm [*.txt or ad.* etc.]
rem note: a dg argument must be given, at least -o (a do-nothing)
rem note: always writes result to file named ztempx (and sends to more)
rem note: thus ztempx must not be in the scope of the wildcard
IF "%1" == "" GOTO helps
IF "%3" == "" GOTO error
ECHO ======Executing the command dg %1 %2 %3 %4 %5 %6
ECHO ======Will overwrite ztempx
ECHO ======RETURN to continue, ctrl-C to quit
PAUSE
IF EXIST ztempx DEL ztempx>NUL
rem if touch program not available, use: @REM redirect_to ztempx
rem touch ztempx
@REM >ztempx
rem current setup for 6 total args: supports, e.g., up to -m3
rem its possible to send 11 arguments with, e.g., -m9
rem expand below to handle 11 if desired
if exist %6 goto six
if exist %5 goto five
if exist %4 goto four
if exist %3 goto three
:six
FOR %%X IN (%6) do if %%X==ZTEMPX goto scope
FOR %%X IN (%6) DO COMMAND/C dg %1 %2 %3 %4 %5 %%X >> ztempx
goto didsearch
:five
FOR %%X IN (%5) do if %%X==ZTEMPX goto scope
FOR %%X IN (%5) DO COMMAND/C dg %1 %2 %3 %4 %%X >> ztempx
goto didsearch
:four
FOR %%X IN (%4) do if %%X==ZTEMPX goto scope
FOR %%X IN (%4) DO COMMAND/C dg %1 %2 %3 %%X >> ztempx
goto didsearch
:three
FOR %%X IN (%3) do if %%X==ZTEMPX goto scope
FOR %%X IN (%3) DO COMMAND/C dg %1 %2 %%X >> ztempx
goto didsearch
:didsearch
echo ================ FINDS: ===================================
TYPE ztempx | more
ECHO =========== Finds placed in file ztempx ===================
GOTO end
:scope
echo ERROR- the wildcard term includes the output file "ztempx"
goto paterror
:error
ECHO dgw error.
:helps
ECHO dgw is used to do a dg-search against a wildcard list-of-files.
ECHO e.g. " dgw -Kp searchterm *.foo "
ECHO A dg argument must be used. Use -o for a do-nothing argument.
ECHO e.g. " dgw -o searchterm *.txt "
:paterror
ECHO The output of each search is written to file "ztempx"
ECHO Be sure that the wildcard term cannot "see" the file ztempx
ECHO NAME.* or *.NAM is ok. But be in a separate directory to use * or *.*
ECHO e.g., NOT "dgw -o searchterm *.*" NOR "dgw -o searchterm *"
ECHO e.g., BUT "dgw -o searchterm subdir/*.*" will work.
:end
---------------------cut here -----------------------------------
----------------------------------------------------------------------
*
USAGE: Creating Tailored Data Sets from A Master
I need to maintain a large set of test datasets. For the actual test,
each must be an individual file, but maintenance is much easier if
they are all kept in a single master file. Each message is delimited.
At test run, a script executes dg with the -D option, creating the
individual files of the targeted datasets, before executing the actual
tests that will act on the individual files.
All datasets include one or more keywords such as "full", others
"fullminus", and others "specialcase4"; the keywords indicate the
class of test. Depending on need, a dg for the desired keyword
produces the tailored test set files.
----------------------------------------------------------------------
*
USAGE: Understanding the Multiple Terms Options ( -m, -M, -E, -y )
These options can be confusing, but each has been a lifesaver at one
time or another. These examples may help:
dg -m foo fum filename - a hit if foo & fum on a single line
dg -m3 foo fum fay filename - a hit if all 3 on a single line
dg -M3 foo fum fay filename - a hit if all 3 anywhere in a record
dg -E3 foo filename - a hit if foo is on at least
3 separate lines in a record
The following are not useful searches, but they help explain the
behavior when searchterms overlap:
dg -M3 foo foo foo filename - will succeed if 1 foo in a record
dg -m3 foo foo foo filename - will succeed if 1 foo in a line
The -y option needs an assist from the -m option in meeting the
command line syntax, but the meaning of terms and behavior are very
different. Also, the -y option may be used only with a -K option.
dg -yKm3 foo fum fay filename
For this case, "foo" becomes the sole searchterm determining
whether a record is a hit. For such records, the first keyline
is printed (-K), and for the remainder of the record, any lines
containing "fum" or "fay" will be printed. The "m" in the options
is used only to bring in the searchterms and then its "normal" meaning
is ignored.
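The -E behavior is the easiest of the set to pin down in code. Here is an
illustrative Python emulation (the function name is invented; dg itself is a
standalone program) of -E[n]: a record is a hit only when the single term
appears on at least n separate lines of that record.

```python
def extras(term, n, records):
    """Emulates dg -E[n]: hit if term occurs on at least n separate
    lines of a record (case-insensitive). Records are lists of lines."""
    t = term.lower()
    return [rec for rec in records
            if sum(1 for line in rec if t in line.lower()) >= n]
```

Note the contrast with -M3 foo foo foo above: -E counts distinct *lines*
containing one term, while -M with repeated terms is satisfied by one find.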
----------------------------------------------------------------------
*
USAGE: Eliminating Dupes In Multiply Appended Results Files.
The -U option is generally useful only if you intend to search a large
set of records several times, appending each result to a collection
file. Naturally this kind of job can result in a final file with
quite a few duplicate records.
To avoid this, first run dg against the master file (or a copy thereof)
with an -aJ option to append a record number to the first keyline of
each master record.
Then run your multiple searches against this modified master,
appending the results of all searches to the collection file.
Finally, run dg with a -U option to create a uniq'd final version.
This option needs to build an array of last-fields "already seen." To
limit memory problems in DOS, no one "last-field" may exceed 10
characters in length, and the total record size to be culled may not
exceed 200 records. A simple awk can do the job for tougher cases.
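The culling rule itself is simple; the following illustrative Python sketch
(names invented, not dg's code) shows the -U logic of keeping only the first
record for each value of the last field of the first keyline:

```python
def uniqify(records):
    """Rough emulation of dg -U: drop any record whose first keyline's
    last whitespace-separated field has already been seen.
    Records are lists of lines; the first line is the keyline."""
    seen, out = set(), []
    for rec in records:
        fields = rec[0].split() if rec else []
        key = fields[-1] if fields else ""
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out
```

This is also why the -aJ record-numbering pass matters: the appended record
number gives each master record a unique last field to key on.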
----------------------------------------------------------------------
*
USAGE: The -F Field Option
The field option is only rarely of use, but very powerful when needed.
This option allows you to limit search actions to specific fields of a
line. E.G., consider the command:
dg -F12k3 Elizabeth records
The F12 indicates that only field 12 of any line should be searched for
the searchterm. The k3 would further limit the search to only the
first 3 lines of any record.
A field is defined like a default "field" in awk-- words or terms
separated by whitespace, with leading/trailing whitespace ignored.
The -F option requires some limitations on the maximum field
length and maximum number of fields per line. For DOS, these limits
are 40 and 20. That is, no one field with a length over 40, nor more
than 20 "words" in any one line.
The option is designed primarily for files in which ALL lines stay
within these limits. Any field exceeding the max length will be
truncated to "fit" and a warning posted to the screen. Any one line
exceeding the max number of fields will cause an error warning and the
program will terminate.
This behavior can help detect unintended errors in the way the data
file was created-- if it was your intent to stay within the limits
given.
For other cases, you may intend that only certain lines will be
"fielded" lines, and others should not be restricted. Use the -L
"lax" option to kill the complaints (-LF). Any field past the 20th will
just be ignored. Fields exceeding max length are quietly truncated.
If -F is used with no numeric attached, the program assumes you intend
to search the LAST field of the line. Behavior for this special case
will be correct even if the normal max number of fields is exceeded.
Total line length limits will still apply.
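The per-line test that -F performs can be sketched as follows. This is an
illustrative Python emulation (not dg's code; it ignores the DOS field-count
and field-length limits discussed above) of awk-style field selection, with
the no-numeric case meaning "last field":

```python
def field_hit(term, line, n=None, case_sensitive=False):
    """Emulates the dg -F[n] test on one line: hit only if term occurs
    in field n (1-based, whitespace-separated, leading/trailing
    whitespace ignored). n=None emulates bare -F: the last field."""
    fields = line.split()
    if not fields:
        return False
    if n is None:
        f = fields[-1]
    elif n <= len(fields):
        f = fields[n - 1]
    else:
        return False            # line has fewer than n fields
    if not case_sensitive:
        term, f = term.lower(), f.lower()
    return term in f
```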
----------------------------------------------------------------------
*
USAGE: Updating Record Status:
If there is a need to append one or more new lines to selected records
in a master file:
-- Put the append text in a file named zzapfile
-- Run dg with the -a option.
-- The entire file will be sent to stdout with the
append text appended to records matching the
search text.
If there is a need to append a phrase to the main key-word line of
selected records:
-- As above, but use the -A option.
-- The contents of line 1 of the zzapfile
will be appended to the 1st line of records
matching the search text.
Use the -J option along with the -a or -A options to append as
described, but inhibit printing of records that do not have a match.
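The append-to-matches flow above can be sketched briefly. This is an
illustrative Python emulation (names invented) of -a, with a flag standing in
for -j to suppress non-matching records:

```python
def append_to_matches(records, term, zzap_lines, just_matches=False):
    """Rough emulation of dg -a (plus -j when just_matches=True):
    append the zzapfile lines to each record containing term, and
    pass all other records through unchanged. Records are lists
    of lines; the original file is never modified."""
    out = []
    for rec in records:
        hit = any(term.lower() in line.lower() for line in rec)
        if hit:
            out.append(rec + zzap_lines)
        elif not just_matches:
            out.append(rec)
    return out
```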
----------------------------------------------------------------------
*
USAGE: Updating Records with the -B option
The -B option allows one to update certain kinds of record files from
a manually (or otherwise) produced update file. This is usable only
with files that use line names at the start of each line.
Such a file, the tgtfile, might hold records such as:
john smith
title: staff engineer
ssn: 999-999-9999
salary: 44444
hired: 960506
*
If one creates an update file, foldfile, such as:
ann smith ~salary: 45555
john smith ~salary: 77777
pat kelly ~salary: 33333
john smith ~ssn: 888-888-8888
Then the command:
dg -kB foldfile tgtfile > zz
would update john (and ann's and pat's) salaries, as well as john's
ssn, leaving other records untouched. If john did NOT have a salary
line, it would be appended as a new line in his record. You can update
multiple elements about john from a single update file. (The updated
results are in file zz; dg never changes the original record file.)
The option assumes you will use unique key terms that will be found
only once in the designated number of keylines. For example, an
update file such as
smith ~salary: 45555
john ~ssn: 777-777-7777
pat ~salary: 33333
will update john smith's ssn or his salary but not both. If "john
smith" were used instead, both lines would be updated.
This is important to understand, especially if you tell the search to
continue over more than one keyline (by using, e.g., -k3). If a valid
hit is found in, say, keyline 2, then actions will be taken based on
that hit-- and only based on the first hit in that line. If you
expected additional actions based on a second possible hit in keyline
2 -- or a separate hit in keyline 3-- you will be disappointed.
The search looks no further than the first hit.
The line title (e.g., ~salary) is always case sensitive. Use of the
-C option will make searches for the key term (e.g., john smith) also
case sensitive.
Note that line title in the update line should be identical to the
one that is used in the file of records if you want to preserve the
original line name. Otherwise the updated record will take on the
line title provided in the update file. For example, john ~STATUS OK
-- will find and replace "STATUS----: BAD", but the new line would be
"STATUS OK" not "STATUS----: OK".
Limitations: The file will look for "john" only in the keylines
specified. You must use the -k[] option with the -B to designate how
many keylines are to be searched. A maximum of 20 data elements about
john can be used in the file of updates. Data in keyline 1 cannot be
changed.
In general, don't try to overwork this option. It's fine for limited
cases. For more complex work, use awk.
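The single-key case above can be sketched with awk standing in for dg
itself (the filenames, the key, and the "salary" line title are all
made up for the illustration; dg's own matching rules may differ in
detail):

```shell
# mimic one keyed update: in any record whose keyline contains
# "john smith", replace the "salary:" line with the update line
cat > records.txt <<'EOF'
*
john smith
salary: 11111
phone: 555-1212
*
pat kelly
salary: 33333
EOF
awk -v key="john smith" -v repl="salary: 77777" '
  /^\*/          { hit = 0 }       # "*" delimiter line: new record
  index($0, key) { hit = 1 }       # keyline contains the key term
  hit && /^salary:/ { $0 = repl }  # swap in the update line
  { print }
' records.txt > updated.txt
cat updated.txt
```

As with -B itself, the original records.txt is never changed; the
updated copy lands in updated.txt.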
----------------------------------------------------------------------
*
USAGE: Non-ASCII Documents
dg works ONLY with ASCII files. If you are keeping your master records
using document publication software, files saved will normally be
in other than pure ASCII. All need not be lost. Most such software
allows saving a pure ASCII version as well.
I've had to keep quite a few documents in ready-to-publish form
using Framemaker or Interleaf. Whenever edits are made, I simply
create an extra, updated ASCII version as well, either directly or
using an awk script to strip out the formatting and graphics in the
extra copy.
----------------------------------------------------------------------
*
USAGE: Help
Typing dg with no arguments provides some cryptic help.
Use the dgh.bat batch file or an equivalent unix script to get a bit better
help on a particular option. Replace the "\dg\dgman" with your own path to
the dgman file.
With this batch file in your path, enter:
dgh -y
to see the part of dgman that describes the -y option.
Alternatively, use the dghh.bat file (or equivalent) to get help using
a likely keyword:
dghh dash
will show help on the -r ruler line option. Replace the "\dg\dgops.sam"
with your own path to that file.
Note: The dgman and dgops.sam files contain embedded spaces on certain
apparently blank lines to keep certain help sections together when using
the -dd option. For example, see the help for -y.
----------------------------------------------------------------------
*
USAGE: Option Confusion
You can come up with a lot of different option combinations using dg.
When you get some combination that does what you want, put it in a
batch file or an alias command. Let the computer do the remembering.
The dg.bat file shown above is a good example of usage.
----------------------------------------------------------------------
*
BEHAVIOR: Treatment of Punctuation
DOS BEHAVIOR:
In a searchterm, <>| must be quoted. The ; and " symbols can be in
a searchterm only if using the -f option. A backslash (\) may be in a
searchterm but must not immediately precede a double quote ("). The %
symbol can be in a searchterm only if using the -f option or using the
command line directly; from a DOS batch file, the % symbol would be
lost.
UNIX BEHAVIOR:
Generally less silly. If you must include punctuation in a searchterm,
you may or may not need to use single quotes around the term. Experiment.
----------------------------------------------------------------------
*
BEHAVIOR: "WORDSEARCH" (-w)
A searchterm "hit" meets -w wordmatch criteria as long as the hit:
-- is bound on left by: SOL, non-alpha, non-digit, non-underscore
-- is bound on right by: EOL, non-alpha, non-digit, non-underscore
(except with -U option, where underscore IS treated as wordbreak)
A "word" bounded by punctuation remains a word. Thus !@#$wow#$&%
will qualify in a wordsearch for "wow". SOL means "Start of Line";
EOL means "End of Line".
ALSO-- digits or punctuation INSIDE the searchterm do not disqualify it
as a "word." For example, "walla6*%^walla7" can be a word "hit" for
the "word" "walla6*%^walla7" since wordmatch only checks the area left &
right of the "hit."
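The boundary rule can be approximated with grep -E (an illustration
only; dg's own matcher is not grep):

```shell
# a find counts as a "word" when bounded by line start/end or any
# character that is not a letter, digit, or underscore
printf '%s\n' '!@#$wow#$&%' 'wowza' 'a wow b' > wtest.txt
grep -E '(^|[^A-Za-z0-9_])wow([^A-Za-z0-9_]|$)' wtest.txt > whits.txt
cat whits.txt
```

Here "!@#$wow#$&%" and "a wow b" qualify, but "wowza" does not.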
----------------------------------------------------------------------
*
BEHAVIOR: UNPRINTABLE CHARACTERS:
The program is not designed to deal with control characters or high-
bit ASCII above 127 in either the text or the searchterms. Consider
the behavior unpredictable when these are present.
----------------------------------------------------------------------
*
EXAMPLES
--- Random Files:
The simplest example uses are for keeping randomly organized address
or contact records, system/software requirements statements, multi-
line quotations or references, mini-help files, to-do files, scheduled
appointments, descriptions of hobby collectibles, recipes, or simply
random ideas. Just separate all "data chunks" with a "*" delimiter.
--- Unordered ASCII Documents (Paragraph Mode):
Any documentation you reference often, or need to extract from, can be
dg-searched to get the right "paragraphs" to standard output, even if
the only delimiters are blank lines.
dg -dd phrase filename
will print all paragraphs containing the phrase, and
dg -vdd phrase filename
will print all paragraphs NOT containing the phrase.
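awk's paragraph mode behaves much like -dd and makes a handy stand-in
when dg is not available (notes.txt and the search phrase here are
invented for the example):

```shell
# RS="" puts awk in paragraph mode: blank lines separate records,
# much like dg -dd
cat > notes.txt <<'EOF'
first note: apples, pears,
and fruit in general

second note: engines and gears
EOF
awk 'BEGIN{RS=""; ORS="\n\n"} /apples/' notes.txt > para.txt
cat para.txt
```

Changing the pattern to !/apples/ mimics the -vdd form.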
--- Master Documents with Summary Lists (Advanced usage)
Often a master document of, say, software trouble reports, can be
delimited and then searched for related topics. The master may be a
manually-maintained one or the ASCII result of a database query.
Item: A22
Block: Control
Date: 940522
Short_Title: Foo fum
Description: This item is not performing correctly whenever
the input is made on a Tuesday before 2:15 PM. Behavior normal
at all other times.
Supply Data: Acme Co. and Ray-Bolixer Inc.
Priority: 3
Assigned: Joe
Due: 940610
*
Item: A33
etc.
*
Perhaps you've created a priorities listfile from it that you use to
keep track of the big picture:
Item Block Date Short_Title Priority Assigned to Due
A22 Control 940522 Foo_fum 3 Joe 940610
A33 Charts 940527 Fum_fee 2 Jane 940616
A44 A-Object 940528 Fee_fie 4 Joe 940622
A55 Control 940530 Fo_fum 1 Dan 940604
Assuming the Block designation is in one of the first, say 3, keylines
of each record, try this:
gawk '$2=="Control"{print "dg -k3 " $1 " master.txt >> temp2.txt"}' listfile > temp1.bat
call temp1 (runs the dg calls; found records accumulate in temp2.txt)
Here master.txt is the master record file.
The awk creates a batch file of dg calls that will give you the
details on the Control block problems. Of course, if the records are
in a full-fledged database, you could query it directly. The dg
approach is primarily of value in batch files/scripts-- especially if
the data source is not worth entering or maintaining in a full fledged
database system.
----------------------------------------------------------------------
*
SEE ALSO
grep, awk, sed
----------------------------------------------------------------------
*
BUGS & LIMITATIONS
Max line length in input file: 200 (500 in unix versions)
Max searchstring length: 200 (500 in unix versions)
Max num of searchterms in -f searchfile: 100 (1000 in unix versions)
Max searchstring length with -f: 20 (500 in unix versions)
Max number of fields for -F options: 20 (100 in unix versions)
Max field length for -x, -F options: 40 (100 in unix versions)
Maximum number of records
when killing dupes: 200 (500 in unix versions)
Overlong lines on input are tolerated, but truncated as far as the
search is concerned.
Error management for insensible option combinations is provided only
for the most common mismatches.
LIMITATION: Use of a file of searchterms:
The -f option provides results only in the order that hits occur in
the original data file. It can be useful to get results in the order
the terms appear in the list of search terms instead.
Use the following awk & script approach as a workaround:
Create the list of search terms as "srchterms"
Plain dg -f srchterms datafile would give
results in the order the terms are found in datafile.
To get results in the order of terms in srchterms, use
awk -f thisawkfile srchterms > temp.bat
Run the resulting temp.bat file.
Results will be in "results.txt"
thisawkfile:
BEGIN{
#assuming the datafile is "datafile"
#and file of searchterms is "srchterms"
datafile= "datafile"
#following for dos, use 39 (') for unix
q = sprintf("%c",34)
print "del results.txt"
#unix only: print "touch results.txt"
}
#main
{print "dg -k " q $0 q " " datafile " >> results.txt"
#above commands are printed to temp.bat when
#called as awk -f thisawkfile srchterms > temp.bat
}
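The idea behind the workaround (one search per term, results appended
in term order) can be demonstrated with grep standing in for dg; the
filenames and terms here are invented for the example:

```shell
# one search per term, appended in searchterm order rather than
# data-file order
printf '%s\n' beta alpha > srchterms
printf '%s\n' 'alpha record' 'beta record' > datafile
: > results.txt
while read -r term
do
  grep "$term" datafile >> results.txt
done < srchterms
cat results.txt
```

results.txt lists "beta record" before "alpha record": searchterm
order wins, even though the data file has them the other way around.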
LIMITATION: No pipe capability TO dg:
The inability to accept input from a pipe can be annoying, but was a
tradeoff for efficiency. The program opens the input file twice, using
the first file pointer to do the searching, with the second playing a
"follower" role to print records that are "finds." This approach
avoids the need for large memory allocations, thus allowing unlimited
record lengths. Unfortunately stdin cannot be "opened twice," thus
piping the output of other commands to dg has been sacrificed to avoid
record size limitations. OUTput from dg can be redirected through a pipe.
Bugs:
Certainly. Let me know what you find.
-- Pete Marikle
----------------------------------------------------------------------
*